Empirical data on corpus design and usage in biomedical natural language processing
نویسندگان
چکیده
This paper describes the design of six publicly available biomedical corpora. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have implications for the design of the next generation of biomedical corpora.
منابع مشابه
Corpus Design For Biomedical Natural Language Processing
This paper classifies six publicly available biomedical corpora according to various corpus design features and characteristics. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have...
متن کاملCorpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
متن کاملUltra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU
Digital Breast Tomosynthesis (DBT) is a technology that creates three dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is in the order of seconds, we can use Tomosynthesis systems to perform Tomosynthesis-guided Interventional procedures. This research has been designed to study u...
متن کاملDevelopment of a Greek biomedical corpus
Collection and annotation of specialized corpora, for less-spoken languages such as Greek, is crucial endeavour for the development and growth of the language technology research for these languages. This paper presents the design and compilation of a biomedical corpus that took place in the framework of the national R&D project “IATROLEXI” (http://www.iatrolexi.gr). The aim of IATROLEXI is to ...
متن کاملBootstrapping a Romanian Corpus for Medical Named Entity Recognition
Named Entity Recognition (NER) is an important component of natural language processing (NLP), with applicability in the biomedical domain, enabling knowledge discovery from medical texts. Due to the fact that for the Romanian language there are only a few linguistic resources specific to the biomedical domain, we have created a sub-corpus specific to this domain. In this paper we present a new...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- AMIA ... Annual Symposium proceedings. AMIA Symposium
دوره شماره
صفحات -
تاریخ انتشار 2005